Dutch Sublanguage Semantic Tagging combined with Mark-Up Technology
Abstract
In this paper, we show how the morphological component of an existing NLP system for Dutch, the Dutch Medical Language Processor (DMLP), has been extended to produce output that is compatible with the language-independent modules of the LSP-MLP system (Linguistic String Project Medical Language Processor) of New York University. The former can thus take advantage of the language-independent developments of the latter while focusing on the idiosyncrasies of Dutch. This general strategy is illustrated by a practical application: highlighting relevant information in a patient discharge summary (PDS) by means of HyperText Mark-Up Language (HTML) technology. Such an application can be of use for medical administrative purposes in a hospital environment.
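The highlighting application described in the abstract can be pictured with a minimal sketch. It assumes that semantic tagging has already produced (word, semantic class) pairs; the class names `MEDICATION` and `SYMPTOM`, the colours, the `highlight` function, and the sample sentence are all hypothetical illustrations and are not taken from the DMLP or LSP-MLP systems.

```python
from html import escape

# Hypothetical semantic classes mapped to highlight colours (assumption, not from the paper).
HIGHLIGHT_COLOURS = {"MEDICATION": "#cce5ff", "SYMPTOM": "#ffe0b3"}


def highlight(tagged_tokens):
    """Render (word, semantic_class) pairs as an HTML fragment.

    Words whose class has a colour are wrapped in a <span>; all text is HTML-escaped.
    """
    parts = []
    for word, sem_class in tagged_tokens:
        text = escape(word)
        colour = HIGHLIGHT_COLOURS.get(sem_class)
        if colour is not None:
            text = (
                f'<span class="{escape(sem_class)}" '
                f'style="background:{colour}">{text}</span>'
            )
        parts.append(text)
    return " ".join(parts)


# Hypothetical tagged sentence from a discharge summary (invented for illustration).
sentence = [
    ("Patiënt", None),
    ("kreeg", None),
    ("aspirine", "MEDICATION"),
    ("voor", None),
    ("hoofdpijn", "SYMPTOM"),
]
html_fragment = highlight(sentence)
```

Keeping the semantic class in the `class` attribute would let a stylesheet, rather than the generator, decide which categories are shown for a given administrative task.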
Similar resources
Interacting Semantic Layers of Annotation in SoNaR, a Reference Corpus of Contemporary Written Dutch
This paper reports on the annotation of a corpus of 1 million words with four semantic annotation layers, including named entities, coreference relations, semantic roles and spatial and temporal expressions. These semantic annotation layers can benefit from the manually verified part of speech tagging, lemmatization and syntactic analysis (dependency tree) information layers which resulted from...
From D-Coi to SoNaR: a reference corpus for Dutch
The computational linguistics community in The Netherlands and Belgium has long recognized the dire need for a major reference corpus of written Dutch. In part to answer this need, the STEVIN programme was established. To pave the way for the effective building of a 500-million-word reference corpus of written Dutch, a pilot project was established. The Dutch Corpus Initiative project or D-Coi ...
Combining Independent Knowledge Sources for Word Sense Disambiguation
Yorick Wilks and Mark Stevenson, Department of Computer Science, University of Sheffield, Regent Court, 211 Portobello Street, Sheffield S1 4DP, UK. Sense tagging, the automatic assignment of the appropriate sense from some lexicon to each of the words in a text, is a specialised instance of the general problem of word sense disambiguation. We di...
Information extraction from non-segmented text (on the material of weather forecast telegrams)
A domain- and sublanguage-specific approach to text analysis and information extraction is proposed. The texts under consideration are weather forecast telegrams written in Russian. Telegrams are an example of a deviant text type: they lack means of text segmentation and contain many abbreviations as well as syntactic and spelling mistakes. The presented work addresses the problem of text segmentation: a procedur...
Cornetto: A Combinatorial Lexical Semantic Database for Dutch
One of the goals of the STEVIN programme is the realisation of a digital infrastructure that will enforce the position of the Dutch language in the modern information and communication technology. A semantic database for Dutch is a crucial component for this infrastructure for three reasons: (1) it enables the development of semantic web applications on top of knowledge and information expresse...
Publication date: 1997